Coelacanths are known as "living fossils," as they show remarkable morphological resemblance to the fossil record and belong to the most primitive lineage of living Sarcopterygii (lobe-finned fishes and tetrapods). Coelacanths may be key to elucidating the tempo and mode of evolution from fish to tetrapods. Here, we report the genome sequences of five coelacanths, including four Latimeria chalumnae individuals (three specimens from Tanzania and one from Comoros) and one L. menadoensis individual from Indonesia. These sequences cover two African breeding populations and both known extant coelacanth species. The genome is ~2.74 Gbp and contains a high proportion (~60%) of repetitive elements. The genetic diversity among the individuals was extremely low, suggesting a small population size and/or a slow rate of evolution. We found a substantial number of genes that encode olfactory and pheromone receptors with features characteristic of tetrapod receptors for the detection of airborne ligands. We also found that limb enhancers of bmp7 and gli3, both of which are essential for limb formation, are conserved between coelacanth and tetrapods but not in ray-finned fishes. We expect that some tetrapod-like genes may have existed early in the evolution of primitive Sarcopterygii and were later co-opted to adapt to terrestrial environments. These coelacanth genomes will provide a cornerstone for studies to elucidate how ancestral aquatic vertebrates evolved into terrestrial animals.



[Supplemental material is available for this article.] 

Since Agassiz (1844) first described the coelacanths, their fossils have been found frequently in sediments from the Early Devonian to the Late Cretaceous periods, implying that the group once diversified successfully. However, the disappearance of coelacanths from the fossil record after the Late Cretaceous period led biologists to believe that coelacanths had died out during the mass extinction event around that time (65 million years ago [Ma]). Therefore, the discovery of the first living coelacanth, Latimeria chalumnae, off the coast of South Africa in 1938 created a sensation not only within the scientific community but also among the general public (Smith 1939). At present, coelacanths are called an "evolutionary relic" or a "living fossil" because their morphology is basically unchanged from that seen in the fossil record (Smith 1939). After the discovery of a second living coelacanth in the Comoros archipelago (Smith 1953), the existence of a viable coelacanth population was confirmed in this area. In addition to the Comoros archipelago, several coelacanths have been captured off the coasts of Mozambique (Schliewen et al. 1993), Madagascar (Heemstra et al. 1996), Kenya (De Vos and Oyugi 2002), and Tanzania (Sasaki et al. 2007). Nikaido et al. (2011) recently demonstrated that a coelacanth population off the northern coastal region of Tanzania is genetically distinct from that of the Comoros, indicating that the northern coastal region of Tanzania is a second habitat of coelacanths in the western Indian Ocean. Furthermore, two coelacanth individuals (L. menadoensis) were also captured off the coast of Manado, Sulawesi, Indonesia (Erdmann et al. 1998), on the opposite side of the Indian Ocean (the locations of captured and observed coelacanths are summarized in Fig. 1; see also Supplemental Fig. 1).

The availability of coelacanth specimens enabled us to carry out genomic studies on these exceptional species (Supplemental Table 1). Coelacanths belong to the Sarcopterygii (lobe-finned fishes and tetrapods) (Forey 1988; Supplemental Fig. 2), which also includes the lungfishes. Indeed, molecular phylogenetic studies clearly indicate that coelacanths and lungfishes are more closely related to tetrapods than to teleost fishes (e.g., Zardoya and Meyer 1996), although the branching order of coelacanths, lungfishes, and tetrapods is still controversial (Takezaki et al. 2004). The karyotype of L. chalumnae, reported to be 48 chromosomes including microchromosomes, is similar to that of frogs and of other species such as turtles and birds, further supporting the phylogenetic closeness of the coelacanth to tetrapods (Bogart et al. 1994). Accordingly, coelacanths may fill an evolutionary gap between fish and tetrapods. Because lungfishes have extremely large genomes that make them unsuitable for comparative genomic analysis (Gregory et al. 2007), the coelacanth is a practical choice for genome-wide analysis.

One of the most conspicuous evolutionary events in Sarcopterygii is the transition from water to land, during which a variety of organs changed in adaptation to a novel environment. For example, the olfactory organ of extant land vertebrates detects airborne chemicals, whereas that of fish primarily detects water-soluble chemicals. Thus, an innovative change occurred in the vertebrate olfactory organ during the habitat transition from water to land. Similarly, the robust endoskeletal structures observed in land vertebrates are believed to be a result of adaptation to terrestrial life (Coates et al. 2002). Investigating such phenotypic alterations is important for elucidating how adaptation to terrestrial life was accomplished during evolution. However, the molecular mechanisms underlying such transitions remain unknown. To elucidate at the molecular level the evolutionary trajectories of vertebrates from water to land, we determined the whole-genome sequences of five coelacanths and performed an extensive comparative genomic analysis from various perspectives.

Results 

Assembling the coelacanth genome 

First, we constructed the reference coelacanth draft genome from one of the Tanzanian specimens (TCC041-004, sex unknown) (Nikaido et al. 2011), which was recovered from the body cavity of its mother (coelacanths give birth to fully formed offspring) (Fig. 2). A micro-computed tomography (micro-CT) scanning image was taken before sampling (Fig. 2; Supplemental Fig. 3). In total, we generated 884.8 Gbp of raw sequence data, of which ~780 Gbp (~300× coverage) was used for the assembly with the newly developed assembler PLATANUS (Supplemental Information 2.3, 2.4, 2.5). The genome size was estimated to be 2.74 Gbp from k-mer analysis (Supplemental Fig. 5; Supplemental Table 3).
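The k-mer estimate can be illustrated with a minimal sketch. Assuming error-free reads, genome size is approximately the total number of k-mer occurrences divided by the depth at which the k-mer frequency histogram peaks. The helper names, the `min_depth` error cutoff, and the toy circular genome are hypothetical illustrations, not the pipeline used for the coelacanth assembly; real data also require handling of heterozygosity, repeats, and reverse-complement k-mers.

```python
import random
from collections import Counter


def kmer_histogram(reads, k):
    """Map each k-mer multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Genome size ~ (total k-mer occurrences) / (peak k-mer depth).

    Multiplicities below min_depth are discarded as likely sequencing errors.
    """
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(kept, key=kept.get)  # depth shared by the most distinct k-mers
    total = sum(depth * n for depth, n in kept.items())
    return total / peak_depth


def tile_circular_reads(genome, read_len):
    """Toy error-free reads: one read starting at every position of a circular genome."""
    wrapped = genome + genome[:read_len - 1]
    return [wrapped[i:i + read_len] for i in range(len(genome))]


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    reads = tile_circular_reads(genome, read_len=50)
    print(estimate_genome_size(kmer_histogram(reads, k=21)))
```

The same total-over-depth arithmetic relates the figures quoted above: ~780 Gbp of assembly input over a 2.74-Gbp genome is about 285-fold, in line with the stated ~300× coverage.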

Figure 1. Captured or observed coelacanth individuals. The location numbers indicate the order of capture. The location names and dates are summarized to the right of the map. Although most coelacanths were recorded in the western Indian Ocean, some were also captured and observed off the coast of Manado, Sulawesi. Key countries are abbreviated as follows: Kenya (Ken.), Tanzania (Tan.), Mozambique (Moz.), Madagascar (Mad.), South Africa (SAf.), Indonesia (Ind.), and the Philippines (Phi.).

Figure 2. Overview of the Tanzanian coelacanth (L. chalumnae). (A) Photograph of the whole body of the juvenile individual (ID: TCC041-004). (B) Micro-CT image of the pelvic fin of the juvenile coelacanth specimen before dissection.

Unique features of the coelacanth genome

Compared with the typical teleost fish, which has a genome size of ~1 Gbp (1 pg in C-value) (Hinegardner and Rosen 1972), the coelacanth genome is large (2.74 Gbp). We found that ~60% of the coelacanth genome consists of repetitive elements (including simple repeats, low-complexity regions, and small RNAs), which is higher than the corresponding percentage in frog (35%), chicken (9%), and mammalian (40%-50%) genomes (Supplemental Fig. 17). Thus, the abundance of repetitive elements may explain the relatively large coelacanth genome. Transposable elements (TEs) also have a considerable impact on the nucleotide composition of the coelacanth genome. The GC content of the entire coelacanth genome is 42%, which is comparable to that of terrestrial vertebrate genomes (41% in human, 42% in chicken, and 40% in frog). However, the TE regions in coelacanths have a high GC content (average of 45%), whereas the non-TE regions have a GC content of only 37%, in contrast to 40% or higher in other vertebrate genomes. The number of CpG islands (CGIs) is extremely high (>90,000) because CGI-containing TEs are spread throughout the coelacanth genome. The number of CGIs in non-TE regions of the coelacanth genome was only 13,319, which is lower than that of human and chicken (23,000-28,000) but comparable to that of frog (15,000) and zebrafish (13,000).
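As an illustration of how CGI counts like these can be obtained, the sketch below scans a sequence with a sliding window and merges windows that satisfy Gardiner-Garden and Frommer-style criteria (200-bp window, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6). The thresholds, the function name, and the idea of running it on TE-masked sequence to obtain a non-TE count are assumptions for illustration; the criteria used in the study are not given in this section.

```python
def find_cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Return merged (start, end) intervals (end exclusive) of CpG-island windows.

    Thresholds are illustrative defaults, not the settings used in the study.
    """
    seq = seq.upper()
    islands = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
        gc_fraction = (c + g) / window
        expected = c * g / window  # expected CpG count given C and G content
        obs_exp = cpg / expected if expected else 0.0
        if gc_fraction >= min_gc and obs_exp >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = end  # overlapping window: extend the current island
            else:
                islands.append([start, end])
    return [tuple(iv) for iv in islands]


if __name__ == "__main__":
    toy = "AT" * 200 + "CG" * 150 + "AT" * 200  # GC-rich block flanked by AT repeats
    print(find_cpg_islands(toy))
```

Windows straddling the GC-rich block pass as long as at least half of each window falls inside it, so the reported island extends somewhat beyond the block itself; this edge behavior is one reason CGI counts depend on the exact criteria chosen.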

TEs in the coelacanth genome 

Most of the repetitive sequences that account for ~60% of the coelacanth genome were characterized as TEs and classified according to their types. In coelacanths, 23% of the genome is made up of DNA transposons, and 26% is made up of retroposons comprising SINEs (13.6%), LINEs (10.6%), and LTR retrotransposons (2.2%) (Table 1; Supplemental Fig. 17). The age distribution of TEs reveals that all TE classes contain both highly divergent copies (>35% sequence divergence from the consensus) and copies with low divergence (<5%, considered young), suggesting that they have transposed or retrotransposed during both early and recent evolution (Fig. 3). In mammalian genomes, TEs that transposed or retrotransposed >100-150 Ma have been detected (International Human Genome Sequencing Consortium 2001). Because the substitution rate in coelacanths is considerably lower than the rates in other vertebrates (see below) (Amemiya et al. 2010; Higasa et al. 2012), it is likely that we could find TEs that were inserted even earlier, possibly >250 Ma.
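Age-distribution plots such as Figure 3 are built by measuring how far each TE copy has diverged from its family consensus and summing copy lengths per divergence bin. The sketch below uses the Kimura two-parameter (K2P) distance, one common correction for multiple substitutions at a site; whether the study used K2P or another measure is not stated here, and the function names and 1% bin width are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy, consensus):
    """K2P distance between an aligned TE copy and its consensus.

    Columns with gaps or ambiguous bases are skipped; rows must be aligned.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def divergence_landscape(copies, bin_width=0.01):
    """Sum copy lengths (bp) per divergence bin; copies are (distance, length) pairs."""
    landscape = {}
    for distance, length in copies:
        b = int(distance // bin_width)
        landscape[b] = landscape.get(b, 0) + length
    return dict(sorted(landscape.items()))
```

Young copies pile up in the low bins and old amplification waves in the high bins; converting bins to ages additionally requires a substitution rate, which is where the slow coelacanth rate matters.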

The coelacanth LINEs comprise at least 10% of the genome. Copies of CR1 and of L2 diverge from their respective consensus sequences by amounts ranging from as little as 1% to more than 35%, which suggests that both LINE families have been active in retrotransposition for hundreds of millions of years. In addition to CR1 and L2, seven LINE clades (Penelope, L1, Tx1, RTE, Vingi, Dong [R4], and R2) are present in the coelacanth genome (Fig. 4). The number of LINE clades in coelacanths is higher than that in tetrapods such as mammals (at most five clades), birds (one clade), anole lizard (seven clades), and frogs (five clades).

Autonomous and nonautonomous DNA transposons constitute 23% of the coelacanth genome, and, notably, Harbinger elements (LatiHarb1) (Smith et al. 2012) occupy 9.3% of the genome. There are dozens of diverse families and subfamilies, as well as nonautonomous elements related to the Harbinger superfamily. Although Harbinger elements are also found in other vertebrates such as fishes and frogs (Kapitonov and Jurka 2004; Hellsten et al. 2010), their high level of diversity and their high copy number in the coelacanth genome are unique (see Supplemental Information 5 for details).



The evolutionary rate of the coelacanth genome has been debated for many years (Noonan et al. 2004; Amemiya et al. 2010; Higasa et al. 2012). With the entire genome available, we first compared a set of 5247 orthologous genes for coelacanth and four other vertebrates (human, chicken, frog, and zebrafish) and then constructed a maximum-likelihood tree to evaluate the difference in evolutionary rates among vertebrates (Fig. 5A). We found that the branch length of the coelacanth lineage is significantly shorter than the other branches in all substitution models applied (Supplemental Table 15). In addition, the likelihood-ratio test confirmed that the branch length for coelacanths is significantly shorter than those of the three other vertebrates (P < 0.01; Supplemental Information 6.1). Our data provide genome-wide confirmation of previous studies that used small sets of selected genes (Noonan et al. 2004; Amemiya et al. 2010; Higasa et al. 2012).
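The likelihood-ratio test mentioned above compares two nested models: a null model in which the coelacanth branch is constrained (for example, to match another lineage) and an alternative in which it is free. Assuming the models differ by one parameter, twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. The sketch below computes that statistic and its p-value; the log-likelihoods would come from a phylogenetics package, and the values in the example are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt, df=1):
    """Return (statistic, p_value) for nested models differing by df parameters.

    statistic = 2 * (lnL_alt - lnL_null). Only df = 1 is handled, using
    P(chi-square_1 > x) = erfc(sqrt(x / 2)).
    """
    if df != 1:
        raise NotImplementedError("this sketch only handles one degree of freedom")
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp tiny negative optimizer noise
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for constrained vs. free coelacanth branch.
    stat, p = likelihood_ratio_test(lnl_null=-100.0, lnl_alt=-96.68)
    print(f"LR = {stat:.2f}, P = {p:.4f}")
```

A statistic above about 6.63 corresponds to P < 0.01 for one degree of freedom; tests with more free parameters would need the general chi-square survival function (e.g., from SciPy), which this sketch deliberately avoids.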

We further determined the entire genome sequence of L. menadoensis and mapped the sequence reads to the reference genome to compare whole-genome sequences between the two coelacanth species (L. chalumnae and L. menadoensis). Surprisingly, the nuclear genetic divergence between them was estimated to be only 0.18% (Fig. 5B), a level comparable to divergence between subpopulations. Although a slow rate of substitution was also recently reported elsewhere (Amemiya et al. 2013), those investigators compared transcriptomes from only two tissues and partial genome sequences of L. menadoensis with the L. chalumnae genome. In the present study, we report a significantly slower rate of nucleotide substitution in the nuclear genome, determined by a whole-genome comparison of the two coelacanth species. In addition, we calculated the Ka/Ks ratio between the two coelacanth species using a total of 4531 coding DNA sequences. The resulting Ka/Ks ratio was estimated to be 0.38 (Supplemental Information 6.2), which is even higher than that for the other vertebrate species. Therefore, the low rate of amino acid substitution in coelacanths cannot be explained by purifying selection (which would make Ka small relative to Ks). Instead, the present result suggests that both Ka and Ks were small, indicating a slow rate of nucleotide substitution in the coelacanth genome.

It is worth noting that the nuclear genetic divergence of 0.0018 shown here is 23-fold smaller than that between the L. menadoensis and L. chalumnae mitochondrial genomes (0.0428) (Inoue et al. 2005; Saitoh et al. 2011). The difference between the nuclear and mitochondrial genomes could be of primary importance in discussing the evolution of the coelacanth genome. If we simply apply the divergence time of 20-30 Ma estimated from the mitochondrial genome analyses (Inoue et al. 2005; Saitoh et al. 2011), the nuclear substitution rate is calculated to be 0.03-0.045 × 10^-9 per site per year. This value is lower than that of other vertebrates (e.g., 1.2 × 10^-9 per site per year for the human-chimpanzee pair, calculated from their genetic distance of 0.0144 [Watanabe et al. 2004] and a divergence time of 6 Ma). Thus, several lines of evidence suggest that the nucleotide substitution rate in the nuclear genome of coelacanths is unexpectedly slow.

Table 1. Composition of TEs in the coelacanth genome

TE class                     Copies (×10^3)   Length (kb)   Genome fraction (%)
SINEs                               1,171.4       359,325                 13.56
LINEs                                 490.2       280,452                 10.58
  L1/Tx1                               62.9        59,508                  2.25
  L2                                  122.9        84,581                  3.19
  CR1                                 193.7       102,870                  3.88
  RTE                                  67.1        23,156                  0.87
  Vingi                                 5.0         3,500                  0.13
  Dong (R4)                             0.2           217                 <0.01
  R2                                   <0.1            57                 <0.01
  Penelope                             38.4         6,618                  0.25
LTR retrotransposons                   56.0        58,466                  2.21
  Gypsy                                 7.3         5,167                  0.20
  ERV1                                  5.9         1,513                  0.06
  Unclassified                          9.4         1,755                  0.07
  DIRS                                 33.4        50,032                  1.89
DNA transposons                     1,834.9       608,983                 22.98
  Harbinger                           344.9       246,900                  9.32
  hAT/Charlie                          90.5        18,045                  0.68
  Mariner/Tc1                          12.0         5,264                  0.20
  Kolobok                              15.5         3,116                  0.12
  piggyBac                              0.2            41                 <0.01
  Polinton                              0.4           204                 <0.01
  Helitron                              7.8         3,103                  0.12
  Nonautonomous elements            1,280.7       294,713                 11.12
  Unclassified                         83.0        37,597                  1.42
Unclassified TEs                      130.3        24,838                  0.94
Total                               3,682.9     1,332,065                 50.27

1742 Genome Research
www.genome.org

Whole-genome sequencing of five coelacanths

Figure 3. Age distribution of TEs in the coelacanth genome. The total length of each TE class is shown against the sequence divergence (%) from the consensus sequence.

Figure 4. Distribution of LINE clades among vertebrates. The presence and absence of the representative LINE clades are shown with a plus and minus sign, respectively. (a) These mammalian LINEs are not active at present, and only fossil sequences are found. (b) RTE and Vingi are distributed only in restricted eutherian groups, possibly because of horizontal transfer events. (c) R2 elements are not reported in anole lizard but are known in turtle and bird genomes.

Nikaido et al.

Figure 5. The evolutionary rate is significantly slower in the coelacanth lineage. (A) Phylogenetic tree of euteleostomes (zebrafish, coelacanth, frog, chicken, and human) constructed with 5247 orthologous genes using the maximum-likelihood method. (B) Genetic divergence between L. chalumnae individuals (TCC041-004, S2, and TCC025 from Tanzania, and Comoro) as well as between L. chalumnae and L. menadoensis (20080806, Indonesia), estimated using SNVs. Homozygous rates indicate the proportion of homozygous SNVs among total SNVs. In total, 1,673,302,134 bp were used for the analysis.
[Figure 5 panels not reproduced. Panel B reports pairwise divergences of 0.00311%-0.00343% (homozygous rates 10.0%-18.7%) among the four L. chalumnae individuals and 0.18248%-0.18268% (homozygous rates 95.5%-95.7%) between each of them and L. menadoensis.]

Heterozygosity rate

To investigate the genetic diversity of coelacanths, we additionally determined the entire genome sequences of three individuals (two from Tanzania and one from Comoro). We then estimated the rate of heterozygosity for each individual. The rates of heterozygosity of the coelacanth individuals from Tanzania, Comoro, and Indonesia were estimated to be 0.0023%-0.0024%, 0.0019%, and 0.0061%, respectively (Supplemental Table 20). Thus, the heterozygosity rates in the coelacanth individuals from the western Indian Ocean were significantly lower than that in the Indonesian individual. Furthermore, the heterozygosity rate was lowest in the Comoro individual (Supplemental Tables 20, 21). The lower heterozygosity rates in coelacanth individuals, as compared with those in human (0.069%) (Wang et al. 2008) and gorilla (0.076%-0.189%) (Scally et al. 2012), are also consistent with the idea of a lower nucleotide substitution rate in coelacanths.

Genes for limb development

Because terrestrialization was an important event during vertebrate evolution, and because coelacanths have historically been regarded as a missing link in that event, we looked for genes that are expected to be associated with terrestrialization. First, we looked at genes related to lobed fins. The lobed fins of the coelacanth exhibit structures that are intermediate between those of fish and tetrapods: they have ray-like dermal bones (lepidotrichia) as well as a tetrapod-like robust endochondral internal skeleton, both of which are ancestral characteristics of primitive sarcopterygians (Figs. 2, 6A; Coates et al. 2002; Friedman et al. 2007). The actinodin (and) genes encode actinodin proteins, which are essential for the formation of lepidotrichia in teleost fishes and are absent in tetrapods (Zhang et al. 2010). Gene-knockdown experiments in zebrafish suggest that the loss of the and genes in the tetrapod lineage led to the fin-to-limb transition (Zhang et al. 2010). We found two intact putative and genes (and_a and and_b) in the coelacanth genome. Both genes possess the conserved domain at the N-terminal region (Fig. 6B) and the repeat regions (Supplemental Fig. 28), both of which are characteristic of and gene family members. This discovery is consistent with the presence of actinotrichia (fiber-like proteins that are observed when the lepidotrichia first form) in coelacanths according to the anatomical description (Geraudie and Meunier 1980), and it demonstrates at the DNA level the retention of plesiomorphic fish-like characteristics (see Supplemental Information 7.2 for details).

We also explored conserved noncoding elements (CNEs) that act as enhancers of key genes for limb development, such as bmp7, grem1, shh, and gli3. These CNEs participate in gene regulatory networks for axial formation, outgrowth, and chondrogenic differentiation in limb development (Zeller et al. 2009). We found apparent sequence similarity in the CNEs for the limb enhancer of bmp7 (Adams et al. 2007) and the limb enhancer (CNE11) of gli3 (Abbasi et al. 2010) between tetrapods and coelacanth, whereas no such similarity was observed in the corresponding genomic regions of ray-finned fishes (Fig. 6C,D). The grem1 limb enhancer is also specifically conserved between tetrapods and coelacanth (Fig. 6E), as was previously reported (Zuniga et al. 2012).
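VISTA plots such as those in Figure 6C-E summarize a pairwise alignment as percent identity in sliding windows along a reference genome (mouse in Fig. 6), with windows above a threshold flagged as conserved. The sketch below reproduces that idea for a single precomputed alignment; the function names, the 100-bp window, and the 70% identity threshold are hypothetical choices, and real VISTA analyses also involve alignment and annotation steps not shown here.

```python
def conservation_profile(ref_row, other_row, window=100):
    """Percent identity in sliding windows measured in reference (ungapped) bases.

    ref_row and other_row are equal-length rows of a pairwise alignment ("-" = gap).
    Returns a list of (ref_start, percent_identity).
    """
    if len(ref_row) != len(other_row):
        raise ValueError("alignment rows must have equal length")
    # One entry per reference base: True if the other species has the same base.
    matches = [r.upper() == o.upper() for r, o in zip(ref_row, other_row) if r != "-"]
    profile = []
    for start in range(len(matches) - window + 1):
        identity = 100.0 * sum(matches[start:start + window]) / window
        profile.append((start, identity))
    return profile


def conserved_regions(profile, window=100, min_identity=70.0):
    """Merge windows at or above min_identity into (start, end) reference intervals."""
    regions = []
    for start, identity in profile:
        if identity < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = end  # overlapping window: extend the current region
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]
```

In this framing, an enhancer "conserved between tetrapods and coelacanth but not ray-finned fishes" is one that yields a conserved region in the mouse-coelacanth profile but none in the mouse-teleost profile for the same locus.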



Chemoreceptor genes

Finally, we looked at olfactory receptor (OR) genes and pheromone receptor (V1R) genes (see Supplemental Information 7.1 for the detailed strategy). Aquatic vertebrates such as teleost fishes and primitive sarcopterygians may sense nonvolatile (water-soluble) chemicals, whereas terrestrial vertebrates such as mammals, reptiles, birds, and frogs mainly sense volatile (airborne) chemicals. In accordance with this expected functional transition, previous studies indicate that the repertoires of OR and V1R genes are highly differentiated between fishes and tetrapods (Niimura and Nei 2005; Saraiva and Korsching 2007; Nei et al. 2008). Figure 7A shows the neighbor-joining tree of V1R genes from a broad range of vertebrates, including coelacanths. Most teleost fishes possess six distantly related V1R genes (fish-V1R1 to fish-V1R6, in blue), whereas tetrapods possess more than 20 closely related V1R genes (designated as tetrapod type; t-V1Rs), which are nested within the clades of fish-V1R1 and fish-V1R2 (Saraiva and Korsching 2007). This suggests that tetrapods increased their V1R copy number through lineage-specific gene expansion, which might have facilitated adaptation to sensing airborne chemicals. In the coelacanth genome, we found almost all of the fish-type V1R genes, which have been retained under the constraints of an underwater environment. Interestingly, however, multiple t-V1R genes (represented by the red triangle in Fig. 7A) were also discovered. The maximum-likelihood tree showed a similar result (Supplemental Fig. 25). Accordingly, lineage-specific expansion of the t-V1R genes possibly occurred in a common ancestor of Sarcopterygii, as reflected in the gene numbers in Supplemental Figure 26. Furthermore, the coelacanth OR genes underwent a similar gene expansion. The α and γ subfamilies of OR genes are preferentially expanded in tetrapod genomes, implying that these receptors detect volatile chemicals in the terrestrial environment (Niimura and Nei 2005; Nei et al. 2008). Interestingly, in coelacanths, there was a substantial amplification in the number of OR genes belonging to subfamily γ (more than 20 copies) (Fig. 7B; Supplemental Fig. 27). Furthermore, OR genes belonging to subfamily α were also amplified in coelacanths (Fig. 7B; Supplemental Fig. 27).
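Figure 7A is a neighbor-joining tree built from JTT distances. The sketch below shows only the textbook neighbor-joining procedure on an already computed distance matrix; estimating JTT distances from protein alignments, rooting, and bootstrapping are outside its scope, and the internal node labels ("node0", ...) are placeholders.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    names: leaf labels; dist: dict of dicts with dist[a][b] for every a != b.
    Returns (node, node, branch_length) edges of an unrooted tree.
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, li))
        edges.append((j, u, d[i][j] - li))
        # Distances from the new internal node to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


def path_length(edges, src, dst):
    """Sum of branch lengths on the unique path between two nodes of the tree."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    stack = [(src, None, 0.0)]
    while stack:
        node, parent, acc = stack.pop()
        if node == dst:
            return acc
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, acc + w))
    raise ValueError("no path between nodes")
```

For an additive (tree-like) distance matrix, neighbor-joining recovers the tree exactly, so leaf-to-leaf path lengths reproduce the input distances; with real JTT distances the fit is approximate, which is why bootstrap support (Fig. 7A) is reported.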

Discussion 

Comparison of TEs among vertebrates 

The proportions of coelacanth TEs were compared with those of other vertebrates (Supplemental Fig. 17; International Human Genome Sequencing Consortium 2001; Hillier et al. 2004; Mikkelsen et al. 2007; Piskurek et al. 2009; Hellsten et al. 2010; UCSC Genome Bioinformatics, http://genome.ucsc.edu/). Interestingly, a comparison of LINE distribution among vertebrates reveals a loss of LINE family diversity in tetrapods (Fig. 4). For example, whereas Nimb clade LINEs are known in teleost fishes (e.g., zebrafish) and insects (e.g., silkworm and mosquito), no obvious copy of Nimb exists in coelacanths or tetrapods (Fig. 4), which may suggest a shared loss of the Nimb family in the common ancestral lineage of Sarcopterygii. Thus, coelacanth LINEs show an intermediate feature in terms of their distribution among vertebrates (see Supplemental Information 5 for details).

Figure 3 shows the amplification waves of TEs in the coelacanth genome, the oldest of which can be traced to >35% divergence from the consensus. This amount of divergence corresponds to about 150 Ma in the case of mammalian TEs (International Human Genome Sequencing Consortium 2001). As demonstrated in the present study, however, the coelacanth genome exhibits a slow rate of substitution. Accordingly, 35% divergence in the coelacanth genome may correspond to an insertion event that occurred more than 400 Ma. This raises the interesting possibility that phylogenetic relationships among tetrapods, coelacanths, and lungfish could be resolved using the retroposon method (Shedlock and Okada 2000). Because the divergence among tetrapods, coelacanths, and lungfish is assumed to have occurred around 400 Ma, these events fall within the time range covered by the coelacanth retroposons and thus within reach of the retroposon method. In this regard, it will be interesting to search the lungfish genome for the coelacanth retroposon families described here. If old retroposons that were amplified ~400 Ma are discovered in the lungfish genome, applying the retroposon method to these phylogenetic relationships may be feasible.
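The reasoning about TE ages can be made explicit. Because a TE copy is compared with its consensus, which approximates the ancestral source sequence, divergence accumulates along one lineage, so age ≈ divergence / rate. Calibrating with the mammalian figure quoted above (35% ≈ 150 Ma) and asking which rate would place a 35%-diverged copy at 400 Ma gives the threshold the argument relies on. This linear sketch ignores saturation and rate variation among sites and families, and the function names are hypothetical.

```python
MAMMAL_DIVERGENCE = 0.35   # divergence of the oldest mammalian TE copies (from the text)
MAMMAL_AGE_YEARS = 150e6   # their approximate age (from the text)


def implied_rate(divergence, age_years):
    """Substitutions per site per year implied by a TE divergence of known age.

    TE copy vs. consensus reflects change along a single lineage: rate = K / t.
    """
    return divergence / age_years


def te_age(divergence, rate_per_year):
    """Approximate insertion age in years: t = K / rate."""
    return divergence / rate_per_year


if __name__ == "__main__":
    mammal_rate = implied_rate(MAMMAL_DIVERGENCE, MAMMAL_AGE_YEARS)
    threshold = implied_rate(0.35, 400e6)  # rate at which 35% divergence means 400 Ma
    print(f"mammalian calibration: {mammal_rate:.3e} per site per year")
    print(f"rate needed for 35% to mean 400 Ma: {threshold:.3e}")
    print(f"ratio to mammalian rate: {threshold / mammal_rate:.3f}")
```

The threshold works out to 150/400 = 0.375 of the mammalian calibration, so the >400 Ma interpretation holds if the coelacanth TE substitution rate is below roughly 0.375 times the mammalian rate.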



Figure 6. Genetic signature of the fin-to-limb transition inferred from a genome comparison among vertebrate species. (A) Model of the fin-to-limb transition based on morphological and molecular features. Red and blue bars indicate the molecular and morphological evolutionary events, respectively. The black and gray areas of the drawings depict the internal skeletons and lepidotrichia, respectively. The skeletons of the pectoral fins or limbs of zebrafish, Tiktaalik, Acanthostega, and mouse (extant tetrapods) were modified from Schneider et al. (2011). The skeleton of the pectoral fin of the coelacanth was drawn according to Millot and Anthony (1958). (B) Alignment of the N-terminal conserved domain of and genes, showing two and genes in the coelacanth genome (arrowheads). Completely and mostly conserved (three or fewer amino acid substitution events during evolution) sites are shown with black and gray backgrounds, respectively. The species are indicated as follows: five teleost fishes, zebrafish (Dre: Danio rerio), stickleback (Gac: Gasterosteus aculeatus), fugu (Tru: Takifugu rubripes), pufferfish (Tni: Tetraodon nigroviridis), and medaka (Ola: Oryzias latipes); spotted gar (Loc: Lepisosteus oculatus); coelacanth (Lch: L. chalumnae); and elephant shark (Cmi: Callorhinchus milii). (C-E) VISTA plots of cis-regulatory elements in six vertebrate species using mouse as the reference for the following loci: (C) bmp7 intron 1 enhancer; (D) CNE11 in intron 10 of gli3; and (E) HMCO1 in the grem1-fmn1 locus. Lines indicate the degree of conservation from 50% to 100%. The genomic regions estimated to be CNEs are shown in pink.

The significantly slow rate of substitution in the coelacanth genome

The present finding of a slow rate of nucleotide substitution in the coelacanth genome may offer insights into why the morphology of coelacanths has evolved so slowly over the past 400 million years (Smith 1939; Forey 1988). Namely, slower nucleotide substitution in coelacanth genes and/or enhancers may reduce the potential for altering phenotypic traits. Although it is widely accepted that phenotypic evolution and neutral DNA evolution are decoupled (Hay et al. 2008), the slow rates of morphological and DNA evolution appear to be coupled in coelacanths. However, because inbreeding or a bottleneck in each of the coelacanth populations of Tanzania, Comoro, and Indonesia could lead to inaccurate estimates of genetic distances, we should still be cautious about conclusions in this regard.
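The nuclear rate quoted in the Results follows from dividing the pairwise divergence K by twice the divergence time T, because both lineages accumulate substitutions after the split: r = K / (2T). The sketch below reproduces the figures quoted there; the function name is hypothetical, and the calculation inherits the uncertainty of the mitochondrial 20-30 Ma divergence-time estimate and the distance caveats raised above.

```python
def pairwise_rate(divergence, split_time_years):
    """Per-site substitution rate per year from a pairwise divergence: r = K / (2T).

    The factor 2 reflects that both descendant lineages accumulate changes.
    """
    return divergence / (2.0 * split_time_years)


if __name__ == "__main__":
    # Coelacanth nuclear divergence (0.0018) with a 20-30 Ma split.
    for t in (20e6, 30e6):
        print(f"coelacanth, T = {t / 1e6:.0f} Ma: {pairwise_rate(0.0018, t):.3e}")
    # Human-chimpanzee: distance 0.0144, split 6 Ma.
    print(f"human-chimp: {pairwise_rate(0.0144, 6e6):.3e}")
```

On these figures, the human-chimpanzee rate is roughly 27- to 40-fold higher than the coelacanth nuclear rate.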

Difference in heterozygosity rates 

We have shown that the rates of heterozygosity differ significantly according to coelacanth locality. Because mutation rates are expected to be similar among extant coelacanth individuals, the difference in heterozygosity rates could result from the demographic history of their populations (i.e., a reduction in population size or inbreeding). In particular, the lowest heterozygosity rate, found for the Comoro individual, suggests a possible population bottleneck. At present, the coelacanth populations in the Comoros archipelago are threatened because of past overexploitation (Hissmann et al. 1998). Our previous mitochondrial analysis also showed lower genetic diversity in the Comoros population than in the Tanzanian population (Nikaido et al. 2011). Thus, our present results prompt us to investigate the population structures of coelacanths more comprehensively by adding specimens currently available for molecular research.
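The argument that heterozygosity differences reflect demography can be written down. Heterozygosity is the fraction of callable sites that are heterozygous, and under a neutral equilibrium model H ≈ θ = 4Neμ, so with equal mutation rates μ the ratio of heterozygosities estimates the ratio of long-term effective population sizes. The sketch below assumes that model and uses hypothetical function names; inbreeding or a recent bottleneck violates the equilibrium assumption, which is the caveat raised above.

```python
def heterozygosity(genotypes):
    """Fraction of callable sites that are heterozygous.

    genotypes: iterable of two-allele tuples such as ("A", "G"); None marks an
    uncallable site and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no callable sites")
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def relative_effective_size(h1, h2):
    """Ne1 / Ne2 implied by H ~ 4 * Ne * mu with equal mu (mu cancels)."""
    return h1 / h2


if __name__ == "__main__":
    # Heterozygosity rates reported above for the Indonesian and Comoro individuals (%).
    print(f"{relative_effective_size(0.0061, 0.0019):.2f}")
```

With the values reported for the Indonesian (0.0061%) and Comoro (0.0019%) individuals, the ratio is about 3.2, which under these strong assumptions would correspond to a roughly threefold larger long-term effective size for the Indonesian population.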



Apparent similarities observed in genes and enhancers 
of Sarcopterygian genomes 

In the present study, we found an apparent sequence similarity in 
CNEs for the limb enhancer between tetrapods and coelacanth. 
From an evolutionary viewpoint, it is likely that these CNEs that 
emerged in the primitive sarcopterygians are essential for shaping 
tetrapod-like robust internal skeletons (Fig. 6A). Importantly, the 
emergence of these novel CNEs and the resulting robust internal 
skeletons in the primitive sarcopterygians could have been used 
initially for effective underwater swimming rather than for locomo- 
tion on land. It is likely that these CNEs were co-opted later dur- 
ing the water-to-land transition in primitive tetrapods, which was 
coupled to the loss of and genes. Furthermore, we revealed the 
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Figure 7. The evolution of chemoreceptor genes in coelacanths. (A) The neighbor-joining tree of 
vertebrate VI R genes. The evolutionary distances were computed using the JTT matrix-based method 
and are presented in amino acid substitutions per site as shown by the scale bar. Bootstrap values 
(10,000 replicates) >60 are shown in the tree. The species are indicated as follows: gray, cow (Bos 
taurus); green, frog (Xenopus tropicalis); red, coelacanth (L chalumnae); blue, five teleost fishes — 
zebrafish (D. rerio), stickleback (C. aculeatus), fugu (T. rubripes), pufferfish (T. nigroviridis), and medaka 
(O. latipes). (B) Comparison of the copy numbers of the OR genes belonging to subfamilies a and 7 
among vertebrates. Blue, yellow, and red bars indicate the genes that were annotated as intact, trun- 
cated (because of the incompleteness of the draft genome), and pseudogenes, respectively. 



amplification of tetrapod-type chemo- 
receptor genes (t-VIRs, a and 7 ORs) in 
the coelacanth genome (Fig. 7). Because 
primitive sarcopterygians inhabited an 
underwater environment, it is unlikely 
that t-VIRs or a and 7 ORs of those groups 
received airborne chemicals. Thus, we 
further speculate that the initial expansion of these genes was not directly
related to terrestrial adaptation and that the genes were subsequently
co-opted in tetrapods. It is possible that the ancestral coelacanth
lineage(s) once inhabited shallow water and then returned to open marine
water. In that case, the robust internal skeleton and the t-V1Rs of
coelacanths may be signatures of adaptation for spending time above the
surface and crawling near the shore. However, this scenario is unlikely
for three reasons. First, the bodies of both extant and extinct coelacanths
are very heavy, and it would be very difficult for them to lift their
bodies against gravity. Second, their bodies are covered by armor scales,
which are expected to be poorly suited to dry conditions. Third, if the
t-V1Rs had expanded as an adaptation to detect airborne chemicals in the
ancestral coelacanth, these genes would subsequently have become
pseudogenes in extant coelacanths, which inhabit a deep marine
environment. These genes are, however, still intact.
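The V1R tree in Figure 7A was built by neighbor-joining from pairwise JTT distances. To illustrate how such a tree is obtained from a distance matrix, here is a minimal Python sketch of the standard neighbor-joining algorithm. It assumes the distances are already computed (the JTT correction and the 10,000 bootstrap replicates are not shown), and the function name and toy matrix are hypothetical, not the authors' pipeline.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    `dist` is a symmetric matrix of pairwise distances (e.g. JTT-corrected
    amino acid distances) whose rows and columns follow `names`.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    # Distances keyed by node label, so merged clusters can be added by name.
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i = sum of distances from i to all active nodes.
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j;
        # ties go to the first pair encountered.
        best_q, bi, bj = None, "", ""
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the new internal node u to the joined pair.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        u = f"({bi}:{li:.3f},{bj}:{lj:.3f})"
        # Distance from u to every other active node k.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (bi, bj):
                d[u][k] = d[k][u] = 0.5 * (d[bi][k] + d[bj][k] - d[bi][bj])
        nodes = [k for k in nodes if k not in (bi, bj)] + [u]
    # Connect the last two clusters, placing the root at the midpoint of that edge.
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:.3f},{b}:{half:.3f});"


if __name__ == "__main__":
    toy = ["a", "b", "c", "d", "e"]
    D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    print(neighbor_joining(toy, D))
    # ((c:4.000,(a:2.000,b:3.000):3.000):1.000,(d:2.000,e:1.000):1.000);
```

The tree is unrooted, so placing the root at the midpoint of the final edge is only a display convenience. For the additive toy matrix, the path lengths in the output reproduce every input distance exactly.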

It has been proposed that some genes utilized for adaptation already
existed before the emergence of novel fauna such as multicellular
organisms (Miyata and Suga 2001). Similarly, we expect that some
tetrapod-like genes already existed in the genomes of ancestral
Sarcopterygii before terrestrial adaptation, even though these genes were
not originally related to it. Other examples of genes that are expected to
be critical for the water-to-land transition, such as those for
hemoglobins, urea synthesis, ovoviviparity, and the swim bladder, are
discussed in Section 7 of the Supplemental Information.



Highlights in the present study 

Recently, Amemiya et al. (2013) published the L. chalumnae genome
sequence. Here, we highlight several novel findings that were not
reported in their analyses.

(1) This is the first application of PLATANUS, a newly developed
assembler, to the determination of an entire large eukaryotic genome
from next-generation sequencing data. The advantages of this assembler
are explained in detail in Supplemental Information 2.3.

(2) We present a complete picture of the transposable elements (TEs) in the
coelacanth genome, which, through the retroposon method, might shed light
on the long-debated phylogenetic position of the coelacanths among
vertebrates.

(3) The whole-genome sequences of multiple coelacanth individuals from
Tanzania, Comoros, and Indonesia, determined in the present study,
enabled us to establish the extremely low genetic divergence among
individuals and to estimate an unexpectedly slow rate of nucleotide
substitution. In addition, these data enabled us to detect significant
differences in genetic diversity among the populations.

(4) We present the possibility that preexisting genes were co-opted at the
time of terrestrial adaptation. This phenomenon could be quite important
for understanding the genetic origins of key innovations in natural
history, providing a starting point for discussions of the process of
macroevolution, including the fin-to-limb transition.

(5) As an example of such discussions, we comprehensively analyzed the
chemoreceptor genes in the coelacanth genome, which provides novel
insight into the evolution of olfaction during the water-to-land
transition.
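Item (3) refers to estimating a rate of nucleotide substitution from the divergence between sequences. The estimator actually used is not specified in this section; as a hedged illustration only, one standard formulation corrects the observed proportion of differing sites p for multiple hits under the Jukes–Cantor model and divides by twice the divergence time T, because substitutions accumulate independently along both diverging lineages:

```latex
\hat{K} = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p\right),
\qquad
\hat{r} = \frac{\hat{K}}{2T}
```

Here \hat{K} is the estimated number of substitutions per site and \hat{r} is the rate per site per unit time (per year if T is in years). A small p paired with a deep divergence time T is what yields a slow rate.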

In summary, the coelacanth genome sequences provide a cornerstone for
answering, at the DNA level, the long-debated and scientifically
important question of how vertebrates successfully adapted to terrestrial
life. Another unexpected discovery derived from the whole-genome data is
the extremely slow substitution rate of the coelacanth genome. In
addition, our finding that the coelacanths of the western Indian Ocean
have significantly lower heterozygosity rates than the Indonesian
coelacanth should promote conservation-related genetic studies (Fricke et
al. 2011) to protect this "priceless heritage from the past" (Smith 1963)
from extinction.
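A heterozygosity rate of the kind compared above is, at its core, a simple ratio: the number of heterozygous sites in one diploid individual divided by the number of bases at which genotypes could be called confidently. The minimal Python sketch below assumes VCF-style genotype strings and a precomputed count of callable bases. Both inputs, and the function name, are hypothetical; the SNV-calling and filtering criteria used in this study are not reproduced here.

```python
from typing import Iterable


def heterozygosity_rate(genotypes: Iterable[str], callable_bases: int) -> float:
    """Heterozygous sites per callable base for one diploid individual.

    `genotypes` holds VCF-style diploid calls such as "0/1" or "1|0";
    missing calls ("./.") and homozygous calls are not counted.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    het = 0
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            het += 1
    return het / callable_bases


if __name__ == "__main__":
    calls = ["0/1", "1/1", "0|1", "./.", "0/0"]
    print(heterozygosity_rate(calls, 1000))  # 0.002
```

The denominator matters as much as the numerator. Comparing individuals is only meaningful when their callable bases are defined by the same coverage and quality thresholds.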

Methods 

Coelacanth specimens and tissue samples 

All of the coelacanth specimens described in this study were accidentally
caught by local fishermen and later transferred to the Tokyo Institute of
Technology; the University of Tokyo; or Aquamarine Fukushima, Marine
Science Museum, under the regulations of the Convention on International
Trade in Endangered Species of Wild Fauna and Flora. Detailed information
about each specimen is provided in the Supplemental Material. The
L. chalumnae specimens (TCC041-004, TCC025, and S2) were transferred from
the Tanzania Fisheries Research Institute to the Tokyo Institute of
Technology under a Memorandum of Understanding (M.O.U.) between the two
institutions. Similarly, the Comoran specimen was transferred from the
Centre National de Documentation et de Recherche Scientifique, Musée
National des Comores (CNDRS) to Aquamarine Fukushima, Marine Science
Museum. Frozen muscle tissue from the Indonesian coelacanth
L. menadoensis was provided by Sam Ratulangi University to the University
of Tokyo under the Cooperative Research Agreement between the two
universities. Detailed information about the specimens is listed in
Supplemental Table 1.

Genome sequencing and assembly 

Sequencing libraries were prepared using the Illumina TruSeq DNA
Sample Prep kit (300 bp, 500 bp, and 1.0 kb) and the SOLiD Mate-Paired
Library Construction kit (2.5 and 5.0 kb; Applied Biosystems) according
to the manufacturers' instructions. All libraries were sequenced on
Illumina HiSeq 2000 sequencers. The raw sequence reads were processed to
trim adapter sequences and to remove read pairs with low quality or
extremely short insert sizes. Whole-genome assembly was performed with
the newly developed assembler PLATANUS, which is optimized for short-read
data from high-throughput sequencers. See Supplemental Information 2.3
for details.
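The read-processing step above (adapter trimming, then removal of low-quality pairs or pairs with extremely short inserts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's filter. The adapter is assumed to be the common Illumina adapter prefix, qualities are assumed to be Phred+33 encoded, and the function names and thresholds are hypothetical. It relies on one useful observation: when a read runs into the adapter, the insert ended at that position, so a read that is very short after trimming indicates a very short insert.

```python
from typing import Optional, Tuple

# Assumed adapter: the common Illumina adapter prefix; substitute the
# sequence actually used for the libraries being processed.
ADAPTER = "AGATCGGAAGAGC"

Read = Tuple[str, str]  # (bases, Phred+33 quality string)


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER, min_overlap: int = 5) -> Read:
    """Cut a read where the adapter starts (full match, or partial match at the 3' end)."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # A read can end partway into the adapter; check suffix/prefix overlaps.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def mean_phred(qual: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string (0 for empty)."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0


def filter_pair(r1: Read, r2: Read, min_len: int = 50,
                min_mean_q: float = 20.0) -> Optional[Tuple[Read, Read]]:
    """Return the adapter-trimmed pair, or None if the pair should be discarded."""
    t1, t2 = trim_adapter(*r1), trim_adapter(*r2)
    for seq, qual in (t1, t2):
        # Adapter read-through starts where the insert ends, so a read that
        # is very short after trimming signals an extremely short insert.
        if len(seq) < min_len or mean_phred(qual) < min_mean_q:
            return None
    return t1, t2
```

The defaults (min_len=50, min_mean_q=20) are illustrative placeholders, not the cutoffs applied in this study. Dedicated trimming tools also handle mismatches within the adapter, which this exact-match sketch does not.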

Genome browser 

The Coelacanth Genome Browser has been established using the
assembled genome sequence and RNA-seq data, and the genomic data
sets used in this study are freely available online (http://coelacanth.nig.ac.jp/;
Supplemental Fig. 16). Built on the Generic Genome Browser (GBrowse)
(Stein et al. 2002), the browser displays the SNV density of each
individual, gene models, expression, repeats, comparative analyses, the
fosmid clone map, and coelacanth/human alignments. In addition, the
browser provides a sequence similarity search against the assembled
coelacanth genome and the gene model sequences with the BLAST and BLAT
programs, and a keyword search against gene symbols and definitions.
Users can also download the entire data set described in this study,
including the genome sequence (LatCha_J1.0) as well as the predicted gene
sequences (CDS and CDS+UTR), protein sequences, gene structures, rRNAs,
and ncRNAs. See Supplemental Information 4.7 for details.
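The per-individual SNV density track mentioned above summarizes variant positions in windows along each scaffold. The browser's actual window size and units are not given here, so the following Python sketch makes assumptions throughout: 1-based positions, non-overlapping windows of a hypothetical default width, and density reported as SNVs per kilobase.

```python
from typing import Iterable, List, Tuple


def snv_density(positions: Iterable[int], seq_len: int,
                window: int = 100_000) -> List[Tuple[int, float]]:
    """SNVs per kb in consecutive, non-overlapping windows along one sequence.

    Positions are 1-based; returns (window_start, density) pairs, where the
    final window may be shorter than `window`.
    """
    if seq_len <= 0 or window <= 0:
        raise ValueError("seq_len and window must be positive")
    counts = [0] * ((seq_len + window - 1) // window)
    for p in positions:
        if not 1 <= p <= seq_len:
            raise ValueError(f"position {p} outside 1..{seq_len}")
        counts[(p - 1) // window] += 1
    result = []
    for i, c in enumerate(counts):
        span = min(window, seq_len - i * window)  # length of this window in bases
        result.append((i * window + 1, 1000.0 * c / span))
    return result


if __name__ == "__main__":
    print(snv_density([1, 5, 10, 15], seq_len=15, window=10))
    # [(1, 300.0), (11, 200.0)]
```

Dividing by the true length of the last window keeps short scaffold ends from looking artificially SNV-poor.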

More details about the genome sequencing and assembly, 
RNA-seq, gene annotation, data mining, bioinformatic analyses, 
phylogenetic analyses, and micro-CT imaging are described in the 
Supplemental Methods and Supplemental Information. 

Data access 

All nucleotide sequence reads and the genome assembly have been
deposited in the DDBJ SRA under BioProject PRJDB500 and LatCha_J1.0
(http://trace.ddbj.nig.ac.jp/dra/index.shtml). The fosmid sequences have
been submitted to NCBI GenBank (http://www.ncbi.nlm.nih.gov/genbank/)
under accession nos. DH994576-DH995329, GA605430-GA720357,
AP012980-AP012984, and AP012992-AP012996.
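The fosmid accession numbers above are given as compact ranges. Assuming, as is conventional, that each range is inclusive and that accessions within a range share a prefix and a fixed-width numeric part, a hypothetical helper such as the one below expands them into individual accessions, for example for scripted retrieval. The four ranges correspond to 754, 114,928, 5, and 5 accessions, respectively.

```python
import re
from typing import List


def expand_accession_range(spec: str) -> List[str]:
    """Expand an inclusive range such as 'AP012980-AP012984' into accessions."""
    m = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", spec)
    if m is None or m.group(1) != m.group(3) or len(m.group(2)) != len(m.group(4)):
        raise ValueError(f"unsupported accession range: {spec!r}")
    prefix, width = m.group(1), len(m.group(2))
    start, end = int(m.group(2)), int(m.group(4))
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    # Zero-pad to the original width so leading zeros (e.g. AP012980) survive.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


if __name__ == "__main__":
    ranges = ["DH994576-DH995329", "GA605430-GA720357",
              "AP012980-AP012984", "AP012992-AP012996"]
    for spec in ranges:
        print(spec, len(expand_accession_range(spec)))
```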